Estimating bias and variance from data
Authors
Abstract
The bias-variance decomposition of error provides useful insights into the error performance of a classifier as it is applied to different types of learning task. Most notably, it has been used to explain the extraordinary effectiveness of ensemble learning techniques. It is important that the research community have effective tools for assessing such explanations. To this end, techniques have been developed for estimating bias and variance from data. The most widely deployed of these uses repeated sub-sampling with a holdout set. We argue, with empirical support, that this approach has serious limitations. First, it provides very little flexibility in the types of distributions of training sets that may be studied. It requires that the training sets be relatively small and that the degree of variation between training sets be very circumscribed. Second, the approach leads to bias and variance estimates that have high statistical variance and hence low reliability. We develop an alternative method that is based on cross-validation. We show that this method allows far greater flexibility in the types of distribution that are examined and that the estimates derived are much more stable. Finally, we show that changing the distributions of training sets from which bias and variance estimates are drawn can alter substantially the bias and variance estimates that are derived.
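The cross-validation idea can be illustrated with a minimal sketch. It is not the paper's procedure or its exact bias and variance definitions. It assumes a simple 0-1 loss formulation in which the "main prediction" for an example is the label predicted most often across repetitions, bias is the fraction of examples whose main prediction is wrong, and variance is the average rate at which predictions differ from the main prediction. The names `cv_bias_variance` and `majority_learner` are hypothetical, as are the toy data.

```python
import random
from collections import Counter


def majority_learner(train):
    """Toy learner (hypothetical): always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def cv_bias_variance(data, learner, folds=3, repeats=10, seed=0):
    """Estimate bias and variance from repeated k-fold cross-validation.

    Each example is classified exactly once per repetition, by a model
    trained on the folds that do not contain it.
    """
    rng = random.Random(seed)
    n = len(data)
    preds = [[] for _ in range(n)]  # predictions collected per example

    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(folds):
            test_idx = order[f::folds]  # every folds-th index forms one fold
            held_out = set(test_idx)
            train = [data[i] for i in order if i not in held_out]
            model = learner(train)
            for i in test_idx:
                preds[i].append(model(data[i][0]))

    bias = 0.0
    variance = 0.0
    for (_, y), p in zip(data, preds):
        main = Counter(p).most_common(1)[0][0]  # assumed "main prediction"
        bias += main != y
        variance += sum(q != main for q in p) / len(p)
    return bias / n, variance / n


# Toy data (assumed): 8 examples of class 0 and 2 of class 1.
data = [((i,), 0) for i in range(8)] + [((i,), 1) for i in range(8, 10)]
bias, variance = cv_bias_variance(data, majority_learner)
```

With this toy learner every training set has a class-0 majority, so the variance term is zero and the bias term equals the minority-class fraction. Changing `folds` changes the training-set size, and with it the distribution of training sets from which the estimates are drawn.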
Related articles
Estimating Variance of the Sample Mean in Two-phase Sampling with Unit Non-response Effect
In sample surveys, we always deal with two types of error: sampling error and non-sampling error. One of the most common non-sampling errors is nonresponse. This error occurs when some sample units are not observed, or are observed but do not answer some of the questions. It cannot be prevented completely, but it can be reduced significantly. The non-response causes bias and ...
Bias and variance reduction in estimating the proportion of true-null hypotheses.
When testing a large number of hypotheses, estimating the proportion of true nulls, denoted by π(0), becomes increasingly important. This quantity has many applications in practice. For instance, a reliable estimate of π(0) can eliminate the conservative bias of the Benjamini-Hochberg procedure on controlling the false discovery rate. It is known that most methods in the literature for estimati...
The Effect of Nonresponse Primary Sampling Units on Estimating the Variance of Changes by Jackknife Method (Case Study: Labor Force Survey Data for 2009 and 2010)
Given the importance of presenting estimates of change in labor force survey indicators along with their variance, this paper investigates the use of the jackknife method for estimating the variance of changes. The effect of nonresponse primary sampling units on estimating the variance of changes is then studied using the jackknife method via intensive simulation stud...
Estimating Suspended Sediment by Artificial Neural Network (ANN), Decision Trees (DT) and Sediment Rating Curve (SRC) Models (Case study: Lorestan Province, Iran)
The aim of this study was to estimate suspended sediment using an ANN model, a DT with the CART algorithm, and different types of SRC at ten stations in Lorestan Province, Iran. The results showed that the accuracy of the ANN with the Levenberg-Marquardt backpropagation algorithm is higher than that of the other two models, especially at high discharges. Comparison of different intervals in models showed that r...
Estimating the variance of survival rates and fecundities
Estimating the risk of extinction or decline requires estimates of the variability in vital rates, such as survival and fecundity. This paper describes a method for estimating variance of survivals and fecundities from census data. The method involves calculating an estimate of the variance in survival and fecundity due to demographic stochasticity and subtracting this estimate from an estimate...